Overview

Dataset statistics

Number of variables: 21
Number of observations: 3312
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 543.5 KiB
Average record size in memory: 168.0 B
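
The average record size follows from the total size and the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
total_size_kib = 543.5
n_observations = 3312

# Assumption: the report uses binary kibibytes (1 KiB = 1024 bytes).
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

The result rounds to the 168.0 B shown in the table.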

Variable types

Numeric: 6
Categorical: 14
DateTime: 1

Alerts

Country has constant value "United States" (Constant)
Order ID has a high cardinality: 1687 distinct values (High cardinality)
Customer ID has a high cardinality: 693 distinct values (High cardinality)
Customer Name has a high cardinality: 693 distinct values (High cardinality)
City has a high cardinality: 350 distinct values (High cardinality)
Product ID has a high cardinality: 1525 distinct values (High cardinality)
Product Name has a high cardinality: 1511 distinct values (High cardinality)
Sales is highly correlated with Profit (High correlation)
Discount is highly correlated with Profit (High correlation)
Profit is highly correlated with Sales and 1 other field (High correlation)
Sales is highly correlated with Profit (High correlation)
Profit is highly correlated with Sales (High correlation)
month_year is highly correlated with Country (High correlation)
Segment is highly correlated with Country (High correlation)
Ship Mode is highly correlated with Country (High correlation)
Country is highly correlated with month_year and 6 other fields (High correlation)
Region is highly correlated with Country and 1 other field (High correlation)
State is highly correlated with Country and 1 other field (High correlation)
Sub-Category is highly correlated with Country and 1 other field (High correlation)
Category is highly correlated with Country and 1 other field (High correlation)
State is highly correlated with Postal Code and 2 other fields (High correlation)
Postal Code is highly correlated with State and 2 other fields (High correlation)
Region is highly correlated with State and 1 other field (High correlation)
Category is highly correlated with Sub-Category and 1 other field (High correlation)
Sub-Category is highly correlated with Category and 1 other field (High correlation)
Sales is highly correlated with Profit (High correlation)
Discount is highly correlated with State and 3 other fields (High correlation)
Profit is highly correlated with Sales (High correlation)
Product ID is uniformly distributed (Uniform)
Product Name is uniformly distributed (Uniform)
Row ID has unique values (Unique)
Discount has 1590 (48.0%) zeros (Zeros)
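
The percentages quoted in the alerts and in the per-variable tables are shares of the 3312 observations. A small sketch that recomputes three of them; the helper name `share` is hypothetical and only used for illustration:

```python
n_rows = 3312  # "Number of observations" in the overview


def share(count, total=n_rows):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


zeros_in_discount = share(1590)    # Zeros alert on Discount
distinct_order_ids = share(1687)   # High cardinality alert on Order ID
distinct_row_ids = share(3312)     # Unique alert on Row ID

print(zeros_in_discount, distinct_order_ids, distinct_row_ids)
```

The three values agree with the 48.0%, 50.9% and 100.0% reported for Discount, Order ID and Row ID.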

Reproduction

Analysis started: 2022-03-16 18:05:03.487870
Analysis finished: 2022-03-16 18:07:05.998742
Duration: 2 minutes and 2.51 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json
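
The reported duration is the difference between the two timestamps. A minimal sketch using only the Python standard library (the format string is an assumption that matches the timestamps as printed):

```python
from datetime import datetime

# Assumed format of the timestamps as they appear in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-03-16 18:05:03.487870", FMT)
finished = datetime.strptime("2022-03-16 18:07:05.998742", FMT)

# Elapsed wall-clock time, split into whole minutes and remaining seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```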

Variables

Row ID
Real number (ℝ≥0)

UNIQUE

Distinct: 3312
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5087.107488
Minimum: 13
Maximum: 9994
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 13
5-th percentile: 643.55
Q1: 2655.75
Median: 5183.5
Q3: 7498.25
95-th percentile: 9471.45
Maximum: 9994
Range: 9981
Interquartile range (IQR): 4842.5

Descriptive statistics

Standard deviation: 2817.482266
Coefficient of variation (CV): 0.5538475986
Kurtosis: -1.180921704
Mean: 5087.107488
Median Absolute Deviation (MAD): 2409.5
Skewness: -0.01617289766
Sum: 16848500
Variance: 7938206.321
Monotonicity: Strictly increasing
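
Several of the Row ID figures are derived from the others, which makes them easy to cross-check. A short sketch under the usual definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean x count); the variable names are illustrative:

```python
# Values copied from the Row ID tables.
n = 3312
minimum, maximum = 13, 9994
q1, q3 = 2655.75, 7498.25
mean = 5087.107488
std = 2817.482266

value_range = maximum - minimum  # reported as Range
iqr = q3 - q1                    # reported as Interquartile range (IQR)
cv = std / mean                  # reported as Coefficient of variation (CV)
variance = std ** 2              # reported as Variance
total = mean * n                 # reported as Sum

print(value_range, iqr, round(cv, 6), round(variance), round(total))
```

Small differences in the last digits are expected because the inputs are themselves rounded in the report.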
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
13                   1      < 0.1%
6694                 1      < 0.1%
6671                 1      < 0.1%
6680                 1      < 0.1%
6681                 1      < 0.1%
6682                 1      < 0.1%
6683                 1      < 0.1%
6689                 1      < 0.1%
6690                 1      < 0.1%
6691                 1      < 0.1%
Other values (3302)  3302   99.7%

Minimum 10 values

Value  Count  Frequency (%)
13     1      < 0.1%
24     1      < 0.1%
35     1      < 0.1%
42     1      < 0.1%
44     1      < 0.1%
72     1      < 0.1%
76     1      < 0.1%
77     1      < 0.1%
78     1      < 0.1%
85     1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
9994   1      < 0.1%
9993   1      < 0.1%
9992   1      < 0.1%
9991   1      < 0.1%
9989   1      < 0.1%
9988   1      < 0.1%
9982   1      < 0.1%
9970   1      < 0.1%
9969   1      < 0.1%
9968   1      < 0.1%

Order ID
Categorical

HIGH CARDINALITY

Distinct: 1687
Distinct (%): 50.9%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value                Count
CA-2017-100111       14
CA-2017-157987       12
CA-2017-140949       9
CA-2017-117457       9
CA-2017-156776       8
Other values (1682)  3260

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 46368
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 875
Unique (%): 26.4%

Sample

1st row: CA-2017-114412
2nd row: US-2017-156909
3rd row: CA-2017-107727
4th row: CA-2017-120999
5th row: CA-2017-139619
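
The Unicode category counts reported for this variable can be illustrated on the five sample rows. The sketch below uses Python's standard `unicodedata` module; it is not necessarily how the profiling tool computes its figures, and it covers only the sample, not all 3312 values:

```python
import unicodedata
from collections import Counter

# The five sample Order ID values listed in the Sample table.
samples = [
    "CA-2017-114412",
    "US-2017-156909",
    "CA-2017-107727",
    "CA-2017-120999",
    "CA-2017-139619",
]

# unicodedata.category returns two-letter codes: "Nd" is Decimal Number,
# "Lu" is Uppercase Letter and "Pd" is Dash Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)
total_chars = sum(category_counts.values())
shares = {
    cat: round(100 * count / total_chars, 1)
    for cat, count in category_counts.items()
}

print(category_counts)
print(shares)
```

Because every Order ID has the same 14-character layout (two letters, two dashes, ten digits), the sample shares match the 71.4% / 14.3% / 14.3% split reported for the full column.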

Common Values

Value                Count  Frequency (%)
CA-2017-100111       14     0.4%
CA-2017-157987       12     0.4%
CA-2017-140949       9      0.3%
CA-2017-117457       9      0.3%
CA-2017-156776       8      0.2%
CA-2017-118017       8      0.2%
CA-2017-140872       8      0.2%
CA-2017-102925       8      0.2%
CA-2017-110905       8      0.2%
CA-2017-113278       8      0.2%
Other values (1677)  3220   97.2%

Length

Histogram of lengths of the category

Value                Count  Frequency (%)
ca-2017-100111       14     0.4%
ca-2017-157987       12     0.4%
ca-2017-140949       9      0.3%
ca-2017-117457       9      0.3%
ca-2017-110905       8      0.2%
us-2017-118087       8      0.2%
ca-2017-161956       8      0.2%
ca-2017-164756       8      0.2%
ca-2017-113278       8      0.2%
ca-2017-102925       8      0.2%
Other values (1677)  3220   97.2%

Most occurring characters

Value             Count  Frequency (%)
1                 8448   18.2%
-                 6624   14.3%
0                 5199   11.2%
2                 5169   11.1%
7                 4664   10.1%
C                 2732   5.9%
A                 2732   5.9%
4                 1809   3.9%
6                 1769   3.8%
3                 1739   3.8%
Other values (5)  5483   11.8%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    33120  71.4%
Dash Punctuation  6624   14.3%
Uppercase Letter  6624   14.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
18448
25.5%
05199
15.7%
25169
15.6%
74664
14.1%
41809
 
5.5%
61769
 
5.3%
31739
 
5.3%
51673
 
5.1%
81344
 
4.1%
91306
 
3.9%
Uppercase Letter
ValueCountFrequency (%)
C2732
41.2%
A2732
41.2%
U580
 
8.8%
S580
 
8.8%
Dash Punctuation
ValueCountFrequency (%)
-6624
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common39744
85.7%
Latin6624
 
14.3%

Most frequent character per script

Common
ValueCountFrequency (%)
18448
21.3%
-6624
16.7%
05199
13.1%
25169
13.0%
74664
11.7%
41809
 
4.6%
61769
 
4.5%
31739
 
4.4%
51673
 
4.2%
81344
 
3.4%
Latin
ValueCountFrequency (%)
C2732
41.2%
A2732
41.2%
U580
 
8.8%
S580
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII46368
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
18448
18.2%
-6624
14.3%
05199
11.2%
25169
11.1%
74664
10.1%
C2732
 
5.9%
A2732
 
5.9%
41809
 
3.9%
61769
 
3.8%
31739
 
3.8%
Other values (5)5483
11.8%

Distinct: 322
Distinct (%): 9.7%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB
Minimum: 2020-01-01 00:00:00
Maximum: 2020-12-30 00:00:00
Histogram with fixed size bins (bins=50)

Ship Mode
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value           Count
Standard Class  1897
Second Class    657
First Class     572
Same Day        186

Length

Max length: 14
Median length: 14
Mean length: 12.74818841
Min length: 8

Characters and Unicode

Total characters: 42222
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Standard Class
2nd row: Second Class
3rd row: Second Class
4th row: Standard Class
5th row: Standard Class

Common Values

Value           Count  Frequency (%)
Standard Class  1897   57.3%
Second Class    657    19.8%
First Class     572    17.3%
Same Day        186    5.6%
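
The total character count and mean length reported for Ship Mode can be recovered from the value counts alone. A minimal sketch, assuming length is measured in characters including the space (variable names are illustrative):

```python
# Counts copied from the Ship Mode "Common Values" table.
counts = {
    "Standard Class": 1897,
    "Second Class": 657,
    "First Class": 572,
    "Same Day": 186,
}

total_rows = sum(counts.values())
# Each label contributes its character length once per occurrence.
total_chars = sum(len(label) * n for label, n in counts.items())
mean_length = total_chars / total_rows

print(total_rows, total_chars, round(mean_length, 8))
```

This reproduces the 42222 total characters and the mean length of about 12.748 shown in the Length and Characters tables.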

Length

Histogram of lengths of the category

Pie chart

Value     Count  Frequency (%)
class     3126   47.2%
standard  1897   28.6%
second    657    9.9%
first     572    8.6%
same      186    2.8%
day       186    2.8%

Most occurring characters

ValueCountFrequency (%)
a7292
17.3%
s6824
16.2%
d4451
10.5%
3312
7.8%
l3126
7.4%
C3126
7.4%
S2740
 
6.5%
n2554
 
6.0%
r2469
 
5.8%
t2469
 
5.8%
Other values (8)3859
9.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter32286
76.5%
Uppercase Letter6624
 
15.7%
Space Separator3312
 
7.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a7292
22.6%
s6824
21.1%
d4451
13.8%
l3126
9.7%
n2554
 
7.9%
r2469
 
7.6%
t2469
 
7.6%
e843
 
2.6%
c657
 
2.0%
o657
 
2.0%
Other values (3)944
 
2.9%
Uppercase Letter
ValueCountFrequency (%)
C3126
47.2%
S2740
41.4%
F572
 
8.6%
D186
 
2.8%
Space Separator
ValueCountFrequency (%)
3312
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin38910
92.2%
Common3312
 
7.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a7292
18.7%
s6824
17.5%
d4451
11.4%
l3126
8.0%
C3126
8.0%
S2740
 
7.0%
n2554
 
6.6%
r2469
 
6.3%
t2469
 
6.3%
e843
 
2.2%
Other values (7)3016
7.8%
Common
ValueCountFrequency (%)
3312
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII42222
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a7292
17.3%
s6824
16.2%
d4451
10.5%
3312
7.8%
l3126
7.4%
C3126
7.4%
S2740
 
6.5%
n2554
 
6.0%
r2469
 
5.8%
t2469
 
5.8%
Other values (8)3859
9.1%

Customer ID
Categorical

HIGH CARDINALITY

Distinct: 693
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value               Count
SV-20365            20
JL-15835            20
Dp-13240            19
MH-18115            19
LC-16870            17
Other values (688)  3217

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 26496
Distinct characters: 39
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 96
Unique (%): 2.9%

Sample

1st row: AA-10480
2nd row: SF-20065
3rd row: MA-17560
4th row: LC-16930
5th row: ES-14080

Common Values

Value               Count  Frequency (%)
SV-20365            20     0.6%
JL-15835            20     0.6%
Dp-13240            19     0.6%
MH-18115            19     0.6%
LC-16870            17     0.5%
SS-20140            16     0.5%
AC-10615            16     0.5%
JM-15250            15     0.5%
EP-13915            15     0.5%
DS-13030            15     0.5%
Other values (683)  3140   94.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
sv-2036520
 
0.6%
jl-1583520
 
0.6%
dp-1324019
 
0.6%
mh-1811519
 
0.6%
lc-1687017
 
0.5%
ss-2014016
 
0.5%
ac-1061516
 
0.5%
jm-1525015
 
0.5%
ep-1391515
 
0.5%
ds-1303015
 
0.5%
Other values (683)3140
94.8%

Most occurring characters

ValueCountFrequency (%)
14014
15.1%
-3312
12.5%
02857
 
10.8%
52602
 
9.8%
21533
 
5.8%
8998
 
3.8%
3993
 
3.7%
6921
 
3.5%
9890
 
3.4%
4878
 
3.3%
Other values (29)7498
28.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number16560
62.5%
Uppercase Letter6603
 
24.9%
Dash Punctuation3312
 
12.5%
Lowercase Letter21
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S610
 
9.2%
C574
 
8.7%
B552
 
8.4%
M528
 
8.0%
D465
 
7.0%
J409
 
6.2%
A385
 
5.8%
H343
 
5.2%
P330
 
5.0%
R299
 
4.5%
Other values (16)2108
31.9%
Decimal Number
ValueCountFrequency (%)
14014
24.2%
02857
17.3%
52602
15.7%
21533
 
9.3%
8998
 
6.0%
3993
 
6.0%
6921
 
5.6%
9890
 
5.4%
4878
 
5.3%
7874
 
5.3%
Lowercase Letter
ValueCountFrequency (%)
p19
90.5%
l2
 
9.5%
Dash Punctuation
ValueCountFrequency (%)
-3312
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common19872
75.0%
Latin6624
 
25.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S610
 
9.2%
C574
 
8.7%
B552
 
8.3%
M528
 
8.0%
D465
 
7.0%
J409
 
6.2%
A385
 
5.8%
H343
 
5.2%
P330
 
5.0%
R299
 
4.5%
Other values (18)2129
32.1%
Common
ValueCountFrequency (%)
14014
20.2%
-3312
16.7%
02857
14.4%
52602
13.1%
21533
 
7.7%
8998
 
5.0%
3993
 
5.0%
6921
 
4.6%
9890
 
4.5%
4878
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII26496
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14014
15.1%
-3312
12.5%
02857
 
10.8%
52602
 
9.8%
21533
 
5.8%
8998
 
3.8%
3993
 
3.7%
6921
 
3.5%
9890
 
3.4%
4878
 
3.3%
Other values (29)7498
28.3%

Customer Name
Categorical

HIGH CARDINALITY

Distinct: 693
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value               Count
Seth Vernon         20
John Lee            20
Dean percer         19
Mick Hernandez      19
Lena Cacioppo       17
Other values (688)  3217

Length

Max length: 22
Median length: 13
Mean length: 12.9794686
Min length: 7

Characters and Unicode

Total characters: 42988
Distinct characters: 56
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 96
Unique (%): 2.9%

Sample

1st row: Andrew Allen
2nd row: Sandra Flanagan
3rd row: Matt Abelman
4th row: Linda Cazamias
5th row: Erin Smith

Common Values

Value               Count  Frequency (%)
Seth Vernon         20     0.6%
John Lee            20     0.6%
Dean percer         19     0.6%
Mick Hernandez      19     0.6%
Lena Cacioppo       17     0.5%
Saphhira Shifley    16     0.5%
Ann Chong           16     0.5%
Janet Martin        15     0.5%
Emily Phan          15     0.5%
Darrin Sayre        15     0.5%
Other values (683)  3140   94.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
frank49
 
0.7%
patrick42
 
0.6%
john41
 
0.6%
michael39
 
0.6%
ann38
 
0.6%
bill36
 
0.5%
alan34
 
0.5%
rick33
 
0.5%
mick31
 
0.5%
dean30
 
0.5%
Other values (829)6281
94.4%

Most occurring characters

ValueCountFrequency (%)
a3965
 
9.2%
e3905
 
9.1%
n3495
 
8.1%
3342
 
7.8%
r3111
 
7.2%
i2632
 
6.1%
l2124
 
4.9%
o2024
 
4.7%
t1738
 
4.0%
s1541
 
3.6%
Other values (46)15111
35.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter32760
76.2%
Uppercase Letter6814
 
15.9%
Space Separator3342
 
7.8%
Other Punctuation62
 
0.1%
Dash Punctuation10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a3965
12.1%
e3905
11.9%
n3495
10.7%
r3111
9.5%
i2632
 
8.0%
l2124
 
6.5%
o2024
 
6.2%
t1738
 
5.3%
s1541
 
4.7%
h1306
 
4.0%
Other values (17)6919
21.1%
Uppercase Letter
ValueCountFrequency (%)
C620
 
9.1%
S610
 
9.0%
B573
 
8.4%
M550
 
8.1%
D482
 
7.1%
J409
 
6.0%
A400
 
5.9%
H358
 
5.3%
P330
 
4.8%
R310
 
4.5%
Other values (16)2172
31.9%
Space Separator
ValueCountFrequency (%)
3342
100.0%
Other Punctuation
ValueCountFrequency (%)
'62
100.0%
Dash Punctuation
ValueCountFrequency (%)
-10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin39574
92.1%
Common3414
 
7.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a3965
 
10.0%
e3905
 
9.9%
n3495
 
8.8%
r3111
 
7.9%
i2632
 
6.7%
l2124
 
5.4%
o2024
 
5.1%
t1738
 
4.4%
s1541
 
3.9%
h1306
 
3.3%
Other values (43)13733
34.7%
Common
ValueCountFrequency (%)
3342
97.9%
'62
 
1.8%
-10
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII42967
> 99.9%
None21
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a3965
 
9.2%
e3905
 
9.1%
n3495
 
8.1%
3342
 
7.8%
r3111
 
7.2%
i2632
 
6.1%
l2124
 
4.9%
o2024
 
4.7%
t1738
 
4.0%
s1541
 
3.6%
Other values (44)15090
35.1%
None
ValueCountFrequency (%)
ö18
85.7%
ü3
 
14.3%

Segment
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value        Count
Consumer     1668
Corporate    980
Home Office  664

Length

Max length: 11
Median length: 8
Mean length: 8.897342995
Min length: 8

Characters and Unicode

Total characters: 29468
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Consumer
2nd row: Consumer
3rd row: Home Office
4th row: Corporate
5th row: Corporate

Common Values

Value        Count  Frequency (%)
Consumer     1668   50.4%
Corporate    980    29.6%
Home Office  664    20.0%

Length

Histogram of lengths of the category

Pie chart

Value      Count  Frequency (%)
consumer   1668   42.0%
corporate  980    24.6%
home       664    16.7%
office     664    16.7%
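
The word-level table for Segment is consistent with lowercasing each value and splitting it on whitespace. The sketch below rebuilds it from the value counts under that assumption; the report's actual tokenisation rules are not stated, so treat this as an illustration only:

```python
from collections import Counter

# Counts copied from the Segment "Common Values" table.
value_counts = {"Consumer": 1668, "Corporate": 980, "Home Office": 664}

# Assumption: words are obtained by lowercasing and splitting on whitespace.
word_counts = Counter()
for label, n in value_counts.items():
    for word in label.lower().split():
        word_counts[word] += n

total_words = sum(word_counts.values())
shares = {w: round(100 * c / total_words, 1) for w, c in word_counts.items()}

print(word_counts)
print(shares)
```

Note that the percentages are shares of all 3976 words, not of the 3312 rows, which is why "consumer" appears as 42.0% here but as 50.4% in the value table.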

Most occurring characters

ValueCountFrequency (%)
o4292
14.6%
e3976
13.5%
r3628
12.3%
C2648
9.0%
m2332
7.9%
n1668
 
5.7%
s1668
 
5.7%
u1668
 
5.7%
f1328
 
4.5%
t980
 
3.3%
Other values (7)5280
17.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter24828
84.3%
Uppercase Letter3976
 
13.5%
Space Separator664
 
2.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o4292
17.3%
e3976
16.0%
r3628
14.6%
m2332
9.4%
n1668
 
6.7%
s1668
 
6.7%
u1668
 
6.7%
f1328
 
5.3%
t980
 
3.9%
p980
 
3.9%
Other values (3)2308
9.3%
Uppercase Letter
ValueCountFrequency (%)
C2648
66.6%
H664
 
16.7%
O664
 
16.7%
Space Separator
ValueCountFrequency (%)
664
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin28804
97.7%
Common664
 
2.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o4292
14.9%
e3976
13.8%
r3628
12.6%
C2648
9.2%
m2332
8.1%
n1668
 
5.8%
s1668
 
5.8%
u1668
 
5.8%
f1328
 
4.6%
t980
 
3.4%
Other values (6)4616
16.0%
Common
ValueCountFrequency (%)
664
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII29468
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o4292
14.6%
e3976
13.5%
r3628
12.3%
C2648
9.0%
m2332
7.9%
n1668
 
5.7%
s1668
 
5.7%
u1668
 
5.7%
f1328
 
4.5%
t980
 
3.3%
Other values (7)5280
17.9%

Country
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value          Count
United States  3312

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 43056
Distinct characters: 10
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: United States
2nd row: United States
3rd row: United States
4th row: United States
5th row: United States

Common Values

Value          Count  Frequency (%)
United States  3312   100.0%

Length

Histogram of lengths of the category

Pie chart

Value   Count  Frequency (%)
united  3312   50.0%
states  3312   50.0%

Most occurring characters

ValueCountFrequency (%)
t9936
23.1%
e6624
15.4%
U3312
 
7.7%
n3312
 
7.7%
i3312
 
7.7%
d3312
 
7.7%
3312
 
7.7%
S3312
 
7.7%
a3312
 
7.7%
s3312
 
7.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter33120
76.9%
Uppercase Letter6624
 
15.4%
Space Separator3312
 
7.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t9936
30.0%
e6624
20.0%
n3312
 
10.0%
i3312
 
10.0%
d3312
 
10.0%
a3312
 
10.0%
s3312
 
10.0%
Uppercase Letter
ValueCountFrequency (%)
U3312
50.0%
S3312
50.0%
Space Separator
ValueCountFrequency (%)
3312
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin39744
92.3%
Common3312
 
7.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t9936
25.0%
e6624
16.7%
U3312
 
8.3%
n3312
 
8.3%
i3312
 
8.3%
d3312
 
8.3%
S3312
 
8.3%
a3312
 
8.3%
s3312
 
8.3%
Common
ValueCountFrequency (%)
3312
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII43056
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t9936
23.1%
e6624
15.4%
U3312
 
7.7%
n3312
 
7.7%
i3312
 
7.7%
d3312
 
7.7%
3312
 
7.7%
S3312
 
7.7%
a3312
 
7.7%
s3312
 
7.7%

City
Categorical

HIGH CARDINALITY

Distinct: 350
Distinct (%): 10.6%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value               Count
New York City       306
Los Angeles         210
San Francisco       190
Seattle             182
Philadelphia        182
Other values (345)  2242

Length

Max length: 16
Median length: 9
Mean length: 9.317934783
Min length: 4

Characters and Unicode

Total characters: 30861
Distinct characters: 50
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 86
Unique (%): 2.6%

Sample

1st row: Concord
2nd row: Philadelphia
3rd row: Houston
4th row: Naperville
5th row: Melbourne

Common Values

Value               Count  Frequency (%)
New York City       306    9.2%
Los Angeles         210    6.3%
San Francisco       190    5.7%
Seattle             182    5.5%
Philadelphia        182    5.5%
Chicago             114    3.4%
Houston             104    3.1%
Columbus            82     2.5%
Dallas              70     2.1%
Jacksonville        45     1.4%
Other values (340)  1827   55.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
city327
 
7.1%
new314
 
6.8%
york308
 
6.7%
san246
 
5.3%
los210
 
4.5%
angeles210
 
4.5%
francisco190
 
4.1%
seattle182
 
3.9%
philadelphia182
 
3.9%
chicago114
 
2.5%
Other values (371)2340
50.6%

Most occurring characters

ValueCountFrequency (%)
e2941
 
9.5%
a2553
 
8.3%
o2391
 
7.7%
l2098
 
6.8%
i2094
 
6.8%
n1997
 
6.5%
t1522
 
4.9%
s1508
 
4.9%
r1446
 
4.7%
1311
 
4.2%
Other values (40)11000
35.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter24927
80.8%
Uppercase Letter4623
 
15.0%
Space Separator1311
 
4.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2941
11.8%
a2553
10.2%
o2391
9.6%
l2098
 
8.4%
i2094
 
8.4%
n1997
 
8.0%
t1522
 
6.1%
s1508
 
6.0%
r1446
 
5.8%
c844
 
3.4%
Other values (15)5533
22.2%
Uppercase Letter
ValueCountFrequency (%)
C707
15.3%
S586
12.7%
L392
8.5%
N365
7.9%
A347
 
7.5%
P340
 
7.4%
Y313
 
6.8%
F311
 
6.7%
D188
 
4.1%
M174
 
3.8%
Other values (14)900
19.5%
Space Separator
ValueCountFrequency (%)
1311
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin29550
95.8%
Common1311
 
4.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2941
 
10.0%
a2553
 
8.6%
o2391
 
8.1%
l2098
 
7.1%
i2094
 
7.1%
n1997
 
6.8%
t1522
 
5.2%
s1508
 
5.1%
r1446
 
4.9%
c844
 
2.9%
Other values (39)10156
34.4%
Common
ValueCountFrequency (%)
1311
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII30861
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e2941
 
9.5%
a2553
 
8.3%
o2391
 
7.7%
l2098
 
6.8%
i2094
 
6.8%
n1997
 
6.5%
t1522
 
4.9%
s1508
 
4.9%
r1446
 
4.7%
1311
 
4.2%
Other values (40)11000
35.6%

State
Categorical

HIGH CORRELATION

Distinct: 47
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value              Count
California         663
New York           352
Texas              317
Washington         215
Pennsylvania       197
Other values (42)  1568

Length

Max length: 20
Median length: 8
Mean length: 8.538949275
Min length: 4

Characters and Unicode

Total characters: 28281
Distinct characters: 46
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: North Carolina
2nd row: Pennsylvania
3rd row: Texas
4th row: Illinois
5th row: Florida

Common Values

Value              Count  Frequency (%)
California         663    20.0%
New York           352    10.6%
Texas              317    9.6%
Washington         215    6.5%
Pennsylvania       197    5.9%
Illinois           172    5.2%
Ohio               161    4.9%
Florida            126    3.8%
North Carolina     85     2.6%
Tennessee          81     2.4%
Other values (37)  943    28.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
california663
17.1%
new424
 
10.9%
york352
 
9.1%
texas317
 
8.2%
washington215
 
5.6%
pennsylvania197
 
5.1%
illinois172
 
4.4%
ohio161
 
4.2%
florida126
 
3.3%
carolina93
 
2.4%
Other values (41)1153
29.8%

Most occurring characters

ValueCountFrequency (%)
a3522
12.5%
i3235
11.4%
n2820
 
10.0%
o2493
 
8.8%
r1764
 
6.2%
e1715
 
6.1%
l1596
 
5.6%
s1578
 
5.6%
C853
 
3.0%
f665
 
2.4%
Other values (36)8040
28.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter23849
84.3%
Uppercase Letter3871
 
13.7%
Space Separator561
 
2.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a3522
14.8%
i3235
13.6%
n2820
11.8%
o2493
10.5%
r1764
7.4%
e1715
7.2%
l1596
6.7%
s1578
6.6%
f665
 
2.8%
h658
 
2.8%
Other values (14)3803
15.9%
Uppercase Letter
ValueCountFrequency (%)
C853
22.0%
N532
13.7%
T398
10.3%
Y352
9.1%
I278
 
7.2%
W252
 
6.5%
M247
 
6.4%
O208
 
5.4%
P197
 
5.1%
F126
 
3.3%
Other values (11)428
11.1%
Space Separator
ValueCountFrequency (%)
561
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin27720
98.0%
Common561
 
2.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a3522
12.7%
i3235
11.7%
n2820
 
10.2%
o2493
 
9.0%
r1764
 
6.4%
e1715
 
6.2%
l1596
 
5.8%
s1578
 
5.7%
C853
 
3.1%
f665
 
2.4%
Other values (35)7479
27.0%
Common
ValueCountFrequency (%)
561
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII28281
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a3522
12.5%
i3235
11.4%
n2820
 
10.0%
o2493
 
8.8%
r1764
 
6.2%
e1715
 
6.1%
l1596
 
5.6%
s1578
 
5.6%
C853
 
3.0%
f665
 
2.4%
Other values (36)8040
28.4%

Postal Code
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 437
Distinct (%): 13.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56186.5151
Minimum: 1841
Maximum: 99301
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 1841
5-th percentile: 10009
Q1: 27978.75
Median: 60472.5
Q3: 90032
95-th percentile: 98103
Maximum: 99301
Range: 97460
Interquartile range (IQR): 62053.25

Descriptive statistics

Standard deviation: 31980.37552
Coefficient of variation (CV): 0.5691824001
Kurtosis: -1.459212056
Mean: 56186.5151
Median Absolute Deviation (MAD): 29576.5
Skewness: -0.1625508912
Sum: 186089738
Variance: 1022744418
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
10035               89     2.7%
10009               81     2.4%
10024               73     2.2%
98105               71     2.1%
94122               70     2.1%
94110               67     2.0%
10011               63     1.9%
98103               63     1.9%
94109               53     1.6%
19140               52     1.6%
Other values (427)  2630   79.4%

Minimum 10 values

Value  Count  Frequency (%)
1841   12     0.4%
1852   7      0.2%
2038   2      0.1%
2138   2      0.1%
2149   9      0.3%
2169   3      0.1%
2740   4      0.1%
2886   3      0.1%
2895   2      0.1%
2908   7      0.2%

Maximum 10 values

Value  Count  Frequency (%)
99301  2      0.1%
99207  4      0.1%
98661  1      < 0.1%
98632  2      0.1%
98502  2      0.1%
98226  3      0.1%
98208  1      < 0.1%
98115  48     1.4%
98105  71     2.1%
98103  63     1.9%
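
Postal Code is profiled as a real number, so its smallest values appear with four digits (for example 1841). US ZIP codes have five digits, which suggests the leading zero was lost when the column was stored numerically. A minimal sketch of restoring the text form, assuming the codes are meant to be five-character strings; the helper name `as_zip` is hypothetical:

```python
def as_zip(code: int) -> str:
    """Format a numeric postal code as a 5-character, zero-padded string."""
    return f"{code:05d}"


# Three of the lowest values and the highest value from the tables.
examples = [1841, 1852, 2038, 99301]
padded = [as_zip(code) for code in examples]

print(padded)
```

Treating the column as text would also keep statistics such as the mean and variance, which have no natural meaning for postal codes, out of the report.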

Region
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value    Count
West     1095
East     921
Central  778
South    518

Length

Max length: 7
Median length: 4
Mean length: 4.861111111
Min length: 4

Characters and Unicode

Total characters: 16100
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: South
2nd row: East
3rd row: Central
4th row: Central
5th row: South

Common Values

Value    Count  Frequency (%)
West     1095   33.1%
East     921    27.8%
Central  778    23.5%
South    518    15.6%

Length

Histogram of lengths of the category

Pie chart

Value    Count  Frequency (%)
west     1095   33.1%
east     921    27.8%
central  778    23.5%
south    518    15.6%

Most occurring characters

ValueCountFrequency (%)
t3312
20.6%
s2016
12.5%
e1873
11.6%
a1699
10.6%
W1095
 
6.8%
E921
 
5.7%
C778
 
4.8%
n778
 
4.8%
r778
 
4.8%
l778
 
4.8%
Other values (4)2072
12.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter12788
79.4%
Uppercase Letter3312
 
20.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t3312
25.9%
s2016
15.8%
e1873
14.6%
a1699
13.3%
n778
 
6.1%
r778
 
6.1%
l778
 
6.1%
o518
 
4.1%
u518
 
4.1%
h518
 
4.1%
Uppercase Letter
ValueCountFrequency (%)
W1095
33.1%
E921
27.8%
C778
23.5%
S518
15.6%

Most occurring scripts

ValueCountFrequency (%)
Latin16100
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
t3312
20.6%
s2016
12.5%
e1873
11.6%
a1699
10.6%
W1095
 
6.8%
E921
 
5.7%
C778
 
4.8%
n778
 
4.8%
r778
 
4.8%
l778
 
4.8%
Other values (4)2072
12.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII16100
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t3312
20.6%
s2016
12.5%
e1873
11.6%
a1699
10.6%
W1095
 
6.8%
E921
 
5.7%
C778
 
4.8%
n778
 
4.8%
r778
 
4.8%
l778
 
4.8%
Other values (4)2072
12.9%

Product ID
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 1525
Distinct (%): 46.0%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value                Count
FUR-CH-10003774      8
OFF-ST-10001325      7
TEC-AC-10003832      7
OFF-BI-10004632      7
OFF-PA-10003673      7
Other values (1520)  3276

Length

Max length: 15
Median length: 15
Mean length: 15
Min length: 15

Characters and Unicode

Total characters: 49680
Distinct characters: 27
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 554
Unique (%): 16.7%

Sample

1st row: OFF-PA-10002365
2nd row: FUR-CH-10002774
3rd row: OFF-PA-10000249
4th row: TEC-PH-10004093
5th row: OFF-ST-10003282

Common Values

Value                Count  Frequency (%)
FUR-CH-10003774      8      0.2%
OFF-ST-10001325      7      0.2%
TEC-AC-10003832      7      0.2%
OFF-BI-10004632      7      0.2%
OFF-PA-10003673      7      0.2%
OFF-ST-10003208      7      0.2%
OFF-PA-10001970      7      0.2%
TEC-AC-10004510      7      0.2%
OFF-BI-10003274      6      0.2%
FUR-TA-10001520      6      0.2%
Other values (1515)  3243   97.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
fur-ch-100037748
 
0.2%
off-st-100032087
 
0.2%
tec-ac-100045107
 
0.2%
off-pa-100019707
 
0.2%
off-st-100013257
 
0.2%
off-pa-100036737
 
0.2%
tec-ac-100038327
 
0.2%
off-bi-100046327
 
0.2%
off-bi-100020126
 
0.2%
off-st-100006156
 
0.2%
Other values (1515)3243
97.9%

Most occurring characters

ValueCountFrequency (%)
011602
23.4%
-6624
13.3%
F5070
10.2%
15029
10.1%
O2100
 
4.2%
41644
 
3.3%
31608
 
3.2%
21598
 
3.2%
A1496
 
3.0%
C1111
 
2.2%
Other values (17)11798
23.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number26496
53.3%
Uppercase Letter16560
33.3%
Dash Punctuation6624
 
13.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F5070
30.6%
O2100
12.7%
A1496
 
9.0%
C1111
 
6.7%
U1061
 
6.4%
T1016
 
6.1%
R968
 
5.8%
P918
 
5.5%
E695
 
4.2%
B576
 
3.5%
Other values (6)1549
 
9.4%
Decimal Number
ValueCountFrequency (%)
011602
43.8%
15029
19.0%
41644
 
6.2%
31608
 
6.1%
21598
 
6.0%
51094
 
4.1%
71018
 
3.8%
9978
 
3.7%
6976
 
3.7%
8949
 
3.6%
Dash Punctuation
ValueCountFrequency (%)
-6624
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common33120
66.7%
Latin16560
33.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
F5070
30.6%
O2100
12.7%
A1496
 
9.0%
C1111
 
6.7%
U1061
 
6.4%
T1016
 
6.1%
R968
 
5.8%
P918
 
5.5%
E695
 
4.2%
B576
 
3.5%
Other values (6)1549
 
9.4%
Common
ValueCountFrequency (%)
011602
35.0%
-6624
20.0%
15029
15.2%
41644
 
5.0%
31608
 
4.9%
21598
 
4.8%
51094
 
3.3%
71018
 
3.1%
9978
 
3.0%
6976
 
2.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII49680
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
011602
23.4%
-6624
13.3%
F5070
10.2%
15029
10.1%
O2100
 
4.2%
41644
 
3.3%
31608
 
3.2%
21598
 
3.2%
A1496
 
3.0%
C1111
 
2.2%
Other values (17)11798
23.7%

Category
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value | Count
Office Supplies | 2002
Furniture | 686
Technology | 624

Length

Max length: 15
Median length: 15
Mean length: 12.81521739
Min length: 9

Characters and Unicode

Total characters: 42444
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
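
As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories for the three Category values using Python's standard `unicodedata` module. This is an assumption made for illustration only: the report itself may rely on a different Unicode library, and the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Lu', 'Ll', 'Zs').

    Hypothetical helper for illustration; not the report's own code.
    """
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The three distinct values of the Category variable.
sample = ["Office Supplies", "Furniture", "Technology"]
counts = category_counts(sample)
print(dict(counts))
```

The two-letter codes `Lu`, `Ll` and `Zs` correspond to the Uppercase Letter, Lowercase Letter and Space Separator categories listed for this variable.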

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Office Supplies
2nd row: Furniture
3rd row: Office Supplies
4th row: Technology
5th row: Office Supplies

Common Values

Value | Count | Frequency (%)
Office Supplies | 2002 | 60.4%
Furniture | 686 | 20.7%
Technology | 624 | 18.8%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
office | 2002 | 37.7%
supplies | 2002 | 37.7%
furniture | 686 | 12.9%
technology | 624 | 11.7%

Most occurring characters

ValueCountFrequency (%)
e5314
12.5%
i4690
11.0%
p4004
9.4%
f4004
9.4%
u3374
 
7.9%
c2626
 
6.2%
l2626
 
6.2%
O2002
 
4.7%
s2002
 
4.7%
S2002
 
4.7%
Other values (10)9800
23.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter35128
82.8%
Uppercase Letter5314
 
12.5%
Space Separator2002
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e5314
15.1%
i4690
13.4%
p4004
11.4%
f4004
11.4%
u3374
9.6%
c2626
7.5%
l2626
7.5%
s2002
 
5.7%
r1372
 
3.9%
n1310
 
3.7%
Other values (5)3806
10.8%
Uppercase Letter
ValueCountFrequency (%)
O2002
37.7%
S2002
37.7%
F686
 
12.9%
T624
 
11.7%
Space Separator
ValueCountFrequency (%)
2002
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin40442
95.3%
Common2002
 
4.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e5314
13.1%
i4690
11.6%
p4004
9.9%
f4004
9.9%
u3374
8.3%
c2626
 
6.5%
l2626
 
6.5%
O2002
 
5.0%
s2002
 
5.0%
S2002
 
5.0%
Other values (9)7798
19.3%
Common
ValueCountFrequency (%)
2002
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII42444
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e5314
12.5%
i4690
11.0%
p4004
9.4%
f4004
9.4%
u3374
 
7.9%
c2626
 
6.2%
l2626
 
6.2%
O2002
 
4.7%
s2002
 
4.7%
S2002
 
4.7%
Other values (10)9800
23.1%

Sub-Category
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 17
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value | Count
Binders | 500
Paper | 459
Furnishings | 316
Phones | 294
Storage | 288
Other values (12) | 1455

Length

Max length: 11
Median length: 7
Mean length: 7.188707729
Min length: 3

Characters and Unicode

Total characters: 23809
Distinct characters: 28
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Paper
2nd row: Chairs
3rd row: Paper
4th row: Phones
5th row: Storage

Common Values

Value | Count | Frequency (%)
Binders | 500 | 15.1%
Paper | 459 | 13.9%
Furnishings | 316 | 9.5%
Phones | 294 | 8.9%
Storage | 288 | 8.7%
Art | 282 | 8.5%
Accessories | 275 | 8.3%
Chairs | 190 | 5.7%
Appliances | 165 | 5.0%
Labels | 114 | 3.4%
Other values (7) | 429 | 13.0%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
binders | 500 | 15.1%
paper | 459 | 13.9%
furnishings | 316 | 9.5%
phones | 294 | 8.9%
storage | 288 | 8.7%
art | 282 | 8.5%
accessories | 275 | 8.3%
chairs | 190 | 5.7%
appliances | 165 | 5.0%
labels | 114 | 3.4%
Other values (7) | 429 | 13.0%

Most occurring characters

ValueCountFrequency (%)
s3289
13.8%
e2934
12.3%
r2396
 
10.1%
i1876
 
7.9%
n1759
 
7.4%
a1493
 
6.3%
o1102
 
4.6%
p1000
 
4.2%
h833
 
3.5%
c824
 
3.5%
Other values (18)6303
26.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter20497
86.1%
Uppercase Letter3312
 
13.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s3289
16.0%
e2934
14.3%
r2396
11.7%
i1876
9.2%
n1759
8.6%
a1493
7.3%
o1102
 
5.4%
p1000
 
4.9%
h833
 
4.1%
c824
 
4.0%
Other values (8)2991
14.6%
Uppercase Letter
ValueCountFrequency (%)
P753
22.7%
A722
21.8%
B576
17.4%
F380
11.5%
S347
10.5%
C212
 
6.4%
L114
 
3.4%
T104
 
3.1%
E71
 
2.1%
M33
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Latin23809
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s3289
13.8%
e2934
12.3%
r2396
 
10.1%
i1876
 
7.9%
n1759
 
7.4%
a1493
 
6.3%
o1102
 
4.6%
p1000
 
4.2%
h833
 
3.5%
c824
 
3.5%
Other values (18)6303
26.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII23809
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s3289
13.8%
e2934
12.3%
r2396
 
10.1%
i1876
 
7.9%
n1759
 
7.4%
a1493
 
6.3%
o1102
 
4.6%
p1000
 
4.2%
h833
 
3.5%
c824
 
3.5%
Other values (18)6303
26.5%

Product Name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 1511
Distinct (%): 45.6%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value | Count
Easy-staple paper | 16
Staples | 15
Staples in misc. colors | 12
Staple envelope | 11
Storex Dura Pro Binders | 8
Other values (1506) | 3250

Length

Max length: 127
Median length: 36
Mean length: 37.0513285
Min length: 5

Characters and Unicode

Total characters: 122714
Distinct characters: 84
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 554
Unique (%): 16.7%

Sample

1st row: Xerox 1967
2nd row: Global Deluxe Stacking Chair, Gray
3rd row: Easy-staple paper
4th row: Panasonic Kx-TS550
5th row: Advantus 10-Drawer Portable Organizer, Chrome Metal Frame, Smoke Drawers

Common Values

Value | Count | Frequency (%)
Easy-staple paper | 16 | 0.5%
Staples | 15 | 0.5%
Staples in misc. colors | 12 | 0.4%
Staple envelope | 11 | 0.3%
Storex Dura Pro Binders | 8 | 0.2%
Global Wood Trimmed Manager's Task Chair, Khaki | 8 | 0.2%
Staple remover | 8 | 0.2%
Logitech Desktop MK120 Mouse and keyboard Combo | 7 | 0.2%
Adjustable Depth Letter/Legal Cart | 7 | 0.2%
Sterilite Officeware Hinged File Box | 7 | 0.2%
Other values (1501) | 3213 | 97.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
xerox292
 
1.6%
x220
 
1.2%
202
 
1.1%
with190
 
1.0%
for183
 
1.0%
binders178
 
1.0%
avery173
 
0.9%
chair151
 
0.8%
black147
 
0.8%
phone114
 
0.6%
Other values (2479)16721
90.0%

Most occurring characters

ValueCountFrequency (%)
15134
 
12.3%
e11264
 
9.2%
r6984
 
5.7%
o6626
 
5.4%
a6392
 
5.2%
i6207
 
5.1%
l5439
 
4.4%
n5044
 
4.1%
s4940
 
4.0%
t4800
 
3.9%
Other values (74)49884
40.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter79386
64.7%
Uppercase Letter18619
 
15.2%
Space Separator15275
 
12.4%
Decimal Number5948
 
4.8%
Other Punctuation2415
 
2.0%
Dash Punctuation985
 
0.8%
Final Punctuation24
 
< 0.1%
Open Punctuation21
 
< 0.1%
Close Punctuation21
 
< 0.1%
Math Symbol9
 
< 0.1%
Other values (2)11
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e11264
14.2%
r6984
 
8.8%
o6626
 
8.3%
a6392
 
8.1%
i6207
 
7.8%
l5439
 
6.9%
n5044
 
6.4%
s4940
 
6.2%
t4800
 
6.0%
c2950
 
3.7%
Other values (18)18740
23.6%
Uppercase Letter
ValueCountFrequency (%)
S2071
 
11.1%
C2026
 
10.9%
B1851
 
9.9%
P1616
 
8.7%
M1037
 
5.6%
D985
 
5.3%
A925
 
5.0%
T878
 
4.7%
F878
 
4.7%
L747
 
4.0%
Other values (16)5605
30.1%
Other Punctuation
ValueCountFrequency (%)
,1054
43.6%
/557
23.1%
"412
 
17.1%
.177
 
7.3%
&90
 
3.7%
'76
 
3.1%
#28
 
1.2%
%12
 
0.5%
!5
 
0.2%
;2
 
0.1%
Decimal Number
ValueCountFrequency (%)
11216
20.4%
0977
16.4%
2769
12.9%
4568
9.5%
3513
8.6%
5477
 
8.0%
8417
 
7.0%
9397
 
6.7%
6318
 
5.3%
7296
 
5.0%
Space Separator
ValueCountFrequency (%)
15134
99.1%
 141
 
0.9%
Dash Punctuation
ValueCountFrequency (%)
-985
100.0%
Final Punctuation
ValueCountFrequency (%)
24
100.0%
Open Punctuation
ValueCountFrequency (%)
(21
100.0%
Close Punctuation
ValueCountFrequency (%)
)21
100.0%
Math Symbol
ValueCountFrequency (%)
+9
100.0%
Initial Punctuation
ValueCountFrequency (%)
8
100.0%
Other Number
ValueCountFrequency (%)
¾3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin98005
79.9%
Common24709
 
20.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e11264
 
11.5%
r6984
 
7.1%
o6626
 
6.8%
a6392
 
6.5%
i6207
 
6.3%
l5439
 
5.5%
n5044
 
5.1%
s4940
 
5.0%
t4800
 
4.9%
c2950
 
3.0%
Other values (44)37359
38.1%
Common
ValueCountFrequency (%)
15134
61.2%
11216
 
4.9%
,1054
 
4.3%
-985
 
4.0%
0977
 
4.0%
2769
 
3.1%
4568
 
2.3%
/557
 
2.3%
3513
 
2.1%
5477
 
1.9%
Other values (20)2459
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII122531
99.9%
None151
 
0.1%
Punctuation32
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
15134
 
12.4%
e11264
 
9.2%
r6984
 
5.7%
o6626
 
5.4%
a6392
 
5.2%
i6207
 
5.1%
l5439
 
4.4%
n5044
 
4.1%
s4940
 
4.0%
t4800
 
3.9%
Other values (68)49701
40.6%
None
ValueCountFrequency (%)
 141
93.4%
é6
 
4.0%
¾3
 
2.0%
à1
 
0.7%
Punctuation
ValueCountFrequency (%)
24
75.0%
8
 
25.0%

Sales
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2623
Distinct (%): 79.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 221.3814176
Minimum: 0.444
Maximum: 13999.96
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 0.444
5th percentile: 5.106
Q1: 17.018
Median: 53.81
Q3: 205.1057
95th percentile: 907.643
Maximum: 13999.96
Range: 13999.516
Interquartile range (IQR): 188.0877

Descriptive statistics

Standard deviation: 585.2575313
Coefficient of variation (CV): 2.643661503
Kurtosis: 179.3055029
Mean: 221.3814176
Median Absolute Deviation (MAD): 44.488
Skewness: 10.55472573
Sum: 733215.2552
Variance: 342526.3779
Monotonicity: Not monotonic
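
Several of the Sales figures above are derived from one another. The following sketch re-uses the reported statistics as literals and checks that the range, IQR, coefficient of variation and variance follow from the minimum, maximum, quartiles, mean and standard deviation. The variable names are illustrative and not taken from the report.

```python
import math

# Reported Sales statistics, copied from the report as literals.
minimum, maximum = 0.444, 13999.96
q1, q3 = 17.018, 205.1057
mean, std = 221.3814176, 585.2575313

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared

print(f"range={value_range:.3f} iqr={iqr:.4f} cv={cv:.6f} variance={variance:.4f}")

# Each derived value agrees with the reported one up to rounding.
assert math.isclose(value_range, 13999.516, abs_tol=1e-6)
assert math.isclose(iqr, 188.0877, abs_tol=1e-6)
assert math.isclose(cv, 2.643661503, rel_tol=1e-6)
assert math.isclose(variance, 342526.3779, rel_tol=1e-6)
```

A CV well above 1, as here, indicates that the standard deviation is larger than the mean, which is consistent with the strong right skew reported for Sales.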
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
12.96 | 20 | 0.6%
15.552 | 14 | 0.4%
19.44 | 13 | 0.4%
20.736 | 12 | 0.4%
10.368 | 11 | 0.3%
25.92 | 9 | 0.3%
32.4 | 6 | 0.2%
18.24 | 6 | 0.2%
6.48 | 5 | 0.2%
8.64 | 5 | 0.2%
Other values (2613) | 3211 | 97.0%

Minimum 10 values

Value | Count | Frequency (%)
0.444 | 1 | < 0.1%
0.556 | 1 | < 0.1%
0.99 | 1 | < 0.1%
1.08 | 1 | < 0.1%
1.188 | 1 | < 0.1%
1.188 | 2 | 0.1%
1.248 | 2 | 0.1%
1.392 | 1 | < 0.1%
1.408 | 1 | < 0.1%
1.44 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
13999.96 | 1 | < 0.1%
11199.968 | 1 | < 0.1%
10499.97 | 1 | < 0.1%
7999.98 | 1 | < 0.1%
5443.96 | 1 | < 0.1%
5199.96 | 1 | < 0.1%
5083.96 | 1 | < 0.1%
4799.984 | 1 | < 0.1%
4663.736 | 1 | < 0.1%
4416.174 | 1 | < 0.1%

Quantity
Real number (ℝ≥0)

Distinct: 14
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.766908213
Minimum: 1
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 5
95th percentile: 8
Maximum: 14
Range: 13
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.221776109
Coefficient of variation (CV): 0.5898142412
Kurtosis: 1.793741495
Mean: 3.766908213
Median Absolute Deviation (MAD): 1
Skewness: 1.223969076
Sum: 12476
Variance: 4.936289079
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)

Common values

Value | Count | Frequency (%)
2 | 781 | 23.6%
3 | 759 | 22.9%
5 | 441 | 13.3%
4 | 398 | 12.0%
1 | 337 | 10.2%
7 | 190 | 5.7%
6 | 173 | 5.2%
8 | 99 | 3.0%
9 | 80 | 2.4%
10 | 18 | 0.5%
Other values (4) | 36 | 1.1%

Minimum 10 values

Value | Count | Frequency (%)
1 | 337 | 10.2%
2 | 781 | 23.6%
3 | 759 | 22.9%
4 | 398 | 12.0%
5 | 441 | 13.3%
6 | 173 | 5.2%
7 | 190 | 5.7%
8 | 99 | 3.0%
9 | 80 | 2.4%
10 | 18 | 0.5%

Maximum 10 values

Value | Count | Frequency (%)
14 | 8 | 0.2%
13 | 8 | 0.2%
12 | 7 | 0.2%
11 | 13 | 0.4%
10 | 18 | 0.5%
9 | 80 | 2.4%
8 | 99 | 3.0%
7 | 190 | 5.7%
6 | 173 | 5.2%
5 | 441 | 13.3%

Discount
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 12
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1564673913
Minimum: 0
Maximum: 0.8
Zeros: 1590
Zeros (%): 48.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0.2
Q3: 0.2
95th percentile: 0.7
Maximum: 0.8
Range: 0.8
Interquartile range (IQR): 0.2

Descriptive statistics

Standard deviation: 0.2074291213
Coefficient of variation (CV): 1.325701922
Kurtosis: 2.461983121
Mean: 0.1564673913
Median Absolute Deviation (MAD): 0.2
Skewness: 1.699831702
Sum: 518.22
Variance: 0.04302684037
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values

Value | Count | Frequency (%)
0 | 1590 | 48.0%
0.2 | 1223 | 36.9%
0.7 | 138 | 4.2%
0.8 | 107 | 3.2%
0.4 | 69 | 2.1%
0.3 | 68 | 2.1%
0.6 | 39 | 1.2%
0.1 | 28 | 0.8%
0.5 | 19 | 0.6%
0.15 | 16 | 0.5%
Other values (2) | 15 | 0.5%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1590 | 48.0%
0.1 | 28 | 0.8%
0.15 | 16 | 0.5%
0.2 | 1223 | 36.9%
0.3 | 68 | 2.1%
0.32 | 11 | 0.3%
0.4 | 69 | 2.1%
0.45 | 4 | 0.1%
0.5 | 19 | 0.6%
0.6 | 39 | 1.2%

Maximum 10 values

Value | Count | Frequency (%)
0.8 | 107 | 3.2%
0.7 | 138 | 4.2%
0.6 | 39 | 1.2%
0.5 | 19 | 0.6%
0.45 | 4 | 0.1%
0.4 | 69 | 2.1%
0.32 | 11 | 0.3%
0.3 | 68 | 2.1%
0.2 | 1223 | 36.9%
0.15 | 16 | 0.5%

Profit
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2913
Distinct (%): 88.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.21233986
Minimum: -3839.9904
Maximum: 6719.9808
Zeros: 19
Zeros (%): 0.6%
Negative: 620
Negative (%): 18.7%
Memory size: 26.0 KiB

Quantile statistics

Minimum: -3839.9904
5th percentile: -52.39104
Q1: 1.7632
Median: 8.2968
Q3: 28.315125
95th percentile: 163.80095
Maximum: 6719.9808
Range: 10559.9712
Interquartile range (IQR): 26.551925

Descriptive statistics

Standard deviation: 241.8643416
Coefficient of variation (CV): 8.5729983
Kurtosis: 300.4536849
Mean: 28.21233986
Median Absolute Deviation (MAD): 10.7688
Skewness: 8.217176714
Sum: 93439.2696
Variance: 58498.35974
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 19 | 0.6%
6.2208 | 16 | 0.5%
9.3312 | 13 | 0.4%
7.2576 | 12 | 0.4%
5.4432 | 11 | 0.3%
3.6288 | 9 | 0.3%
12.4416 | 7 | 0.2%
15.552 | 6 | 0.2%
114.9385 | 4 | 0.1%
9.072 | 4 | 0.1%
Other values (2903) | 3211 | 97.0%

Minimum 10 values

Value | Count | Frequency (%)
-3839.9904 | 1 | < 0.1%
-3399.98 | 1 | < 0.1%
-2929.4845 | 1 | < 0.1%
-2287.782 | 1 | < 0.1%
-1306.5504 | 1 | < 0.1%
-1237.8462 | 1 | < 0.1%
-1143.891 | 1 | < 0.1%
-1141.47 | 1 | < 0.1%
-1049.3406 | 1 | < 0.1%
-1002.7836 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
6719.9808 | 1 | < 0.1%
5039.9856 | 1 | < 0.1%
3919.9888 | 1 | < 0.1%
2504.2216 | 1 | < 0.1%
1906.485 | 1 | < 0.1%
1668.205 | 1 | < 0.1%
1453.1238 | 1 | < 0.1%
1439.976 | 1 | < 0.1%
1379.977 | 1 | < 0.1%
1351.9896 | 1 | < 0.1%

month_year
Categorical

HIGH CORRELATION

Distinct: 12
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value | Count
2020-12 | 462
2020-09 | 459
2020-11 | 459
2020-10 | 298
2020-06 | 245
Other values (7) | 1389

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 23184
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020-04
2nd row: 2020-07
3rd row: 2020-10
4th row: 2020-09
5th row: 2020-09

Common Values

Value | Count | Frequency (%)
2020-12 | 462 | 13.9%
2020-09 | 459 | 13.9%
2020-11 | 459 | 13.9%
2020-10 | 298 | 9.0%
2020-06 | 245 | 7.4%
2020-05 | 242 | 7.3%
2020-03 | 238 | 7.2%
2020-07 | 226 | 6.8%
2020-08 | 218 | 6.6%
2020-04 | 203 | 6.1%
Other values (2) | 262 | 7.9%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
2020-12 | 462 | 13.9%
2020-09 | 459 | 13.9%
2020-11 | 459 | 13.9%
2020-10 | 298 | 9.0%
2020-06 | 245 | 7.4%
2020-05 | 242 | 7.3%
2020-03 | 238 | 7.2%
2020-07 | 226 | 6.8%
2020-08 | 218 | 6.6%
2020-04 | 203 | 6.1%
Other values (2) | 262 | 7.9%

Most occurring characters

ValueCountFrequency (%)
09015
38.9%
27193
31.0%
-3312
 
14.3%
11833
 
7.9%
9459
 
2.0%
6245
 
1.1%
5242
 
1.0%
3238
 
1.0%
7226
 
1.0%
8218
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number19872
85.7%
Dash Punctuation3312
 
14.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
09015
45.4%
27193
36.2%
11833
 
9.2%
9459
 
2.3%
6245
 
1.2%
5242
 
1.2%
3238
 
1.2%
7226
 
1.1%
8218
 
1.1%
4203
 
1.0%
Dash Punctuation
ValueCountFrequency (%)
-3312
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common23184
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
09015
38.9%
27193
31.0%
-3312
 
14.3%
11833
 
7.9%
9459
 
2.0%
6245
 
1.1%
5242
 
1.0%
3238
 
1.0%
7226
 
1.0%
8218
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII23184
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
09015
38.9%
27193
31.0%
-3312
 
14.3%
11833
 
7.9%
9459
 
2.0%
6245
 
1.1%
5242
 
1.0%
3238
 
1.0%
7226
 
1.0%
8218
 
0.9%

Interactions

[36 interaction plots; images not recoverable]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
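
The recipe above can be sketched in plain Python: rank both variables (ties share the average rank), then apply the covariance-over-standard-deviations formula to the ranks. The function names and the toy data are illustrative assumptions, not the report's own implementation.

```python
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman(x, y)
r_linear = pearson(x, y)
print(rho, r_linear)
```

On this toy data ρ is 1 up to floating-point error while Pearson's r stays below 1, which is the difference between monotonic and linear association described above.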

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
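
A minimal sketch of this calculation, using made-up data, also illustrates the invariance property: shifting and positively rescaling either variable leaves r unchanged. The function name `pearson_r` and the numbers are assumptions for illustration.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The common 1/n factors cancel, so plain sums are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Illustrative data with a nearly linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)

# Changing location and (positive) scale of each variable separately.
x_shifted = [10 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r, r_shifted)
```

A negative scale factor on one variable would flip the sign of r while keeping its magnitude.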

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
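
The pair-counting definition translates directly into code. The sketch below implements exactly the formula as stated (often called tau-a); statistical libraries commonly use tie-adjusted variants such as tau-b, so results on data with ties may differ. The function name and toy data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.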

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
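
For orientation, here is a minimal sketch of a bias-corrected Cramér's V computed from a contingency table of counts, following the correction as it is commonly stated for Bergsma (2013). It is written under that assumption and is not taken from the report's source code; the function name and the two example tables are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows.

    Assumes every row and column total is non-zero.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Perfect association: each row category maps to exactly one column category.
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
# Independence: every cell matches its expected count.
v_independent = cramers_v_corrected([[25, 25], [25, 25]])
print(v_perfect, v_independent)
```

The `max(0.0, ...)` clamp keeps the corrected statistic from going negative when the observed association is weaker than the expected sampling bias.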

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

(index) | Row ID | Order ID | Order Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | month_year
0 | 13 | CA-2017-114412 | 2020-04-15 | Standard Class | AA-10480 | Andrew Allen | Consumer | United States | Concord | North Carolina | 28027 | South | OFF-PA-10002365 | Office Supplies | Paper | Xerox 1967 | 15.552 | 3 | 0.2 | 5.4432 | 2020-04
1 | 24 | US-2017-156909 | 2020-07-16 | Second Class | SF-20065 | Sandra Flanagan | Consumer | United States | Philadelphia | Pennsylvania | 19140 | East | FUR-CH-10002774 | Furniture | Chairs | Global Deluxe Stacking Chair, Gray | 71.372 | 2 | 0.3 | -1.0196 | 2020-07
2 | 35 | CA-2017-107727 | 2020-10-19 | Second Class | MA-17560 | Matt Abelman | Home Office | United States | Houston | Texas | 77095 | Central | OFF-PA-10000249 | Office Supplies | Paper | Easy-staple paper | 29.472 | 3 | 0.2 | 9.9468 | 2020-10
3 | 42 | CA-2017-120999 | 2020-09-10 | Standard Class | LC-16930 | Linda Cazamias | Corporate | United States | Naperville | Illinois | 60540 | Central | TEC-PH-10004093 | Technology | Phones | Panasonic Kx-TS550 | 147.168 | 4 | 0.2 | 16.5564 | 2020-09
4 | 44 | CA-2017-139619 | 2020-09-19 | Standard Class | ES-14080 | Erin Smith | Corporate | United States | Melbourne | Florida | 32935 | South | OFF-ST-10003282 | Office Supplies | Storage | Advantus 10-Drawer Portable Organizer, Chrome Metal Frame, Smoke Drawers | 95.616 | 2 | 0.2 | 9.5616 | 2020-09
5 | 72 | CA-2017-114440 | 2020-09-14 | Second Class | TB-21520 | Tracy Blumstein | Consumer | United States | Jackson | Michigan | 49201 | Central | OFF-PA-10004675 | Office Supplies | Paper | Telephone Message Books with Fax/Mobile Section, 5 1/2" x 3 3/16" | 19.050 | 3 | 0.0 | 8.7630 | 2020-09
6 | 76 | US-2017-118038 | 2020-12-09 | First Class | KB-16600 | Ken Brennan | Corporate | United States | Houston | Texas | 77041 | Central | OFF-BI-10004182 | Office Supplies | Binders | Economy Binders | 1.248 | 3 | 0.8 | -1.9344 | 2020-12
7 | 77 | US-2017-118038 | 2020-12-09 | First Class | KB-16600 | Ken Brennan | Corporate | United States | Houston | Texas | 77041 | Central | FUR-FU-10000260 | Furniture | Furnishings | 6" Cubicle Wall Clock, Black | 9.708 | 3 | 0.6 | -5.8248 | 2020-12
8 | 78 | US-2017-118038 | 2020-12-09 | First Class | KB-16600 | Ken Brennan | Corporate | United States | Houston | Texas | 77041 | Central | OFF-ST-10000615 | Office Supplies | Storage | SimpliFile Personal File, Black Granite, 15w x 6-15/16d x 11-1/4h | 27.240 | 3 | 0.2 | 2.7240 | 2020-12
9 | 85 | US-2017-119662 | 2020-11-13 | First Class | CS-12400 | Christopher Schild | Home Office | United States | Chicago | Illinois | 60623 | Central | OFF-ST-10003656 | Office Supplies | Storage | Safco Industrial Wire Shelving | 230.376 | 3 | 0.2 | -48.9549 | 2020-11

Last rows

(index) | Row ID | Order ID | Order Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | month_year
3302 | 9968 | CA-2017-153871 | 2020-12-11 | Standard Class | RB-19435 | Richard Bierner | Consumer | United States | Plainfield | New Jersey | 7060 | East | OFF-BI-10004209 | Office Supplies | Binders | Fellowes Twister Kit, Gray/Clear, 3/pkg | 40.200 | 5 | 0.0 | 18.0900 | 2020-12
3303 | 9969 | CA-2017-153871 | 2020-12-11 | Standard Class | RB-19435 | Richard Bierner | Consumer | United States | Plainfield | New Jersey | 7060 | East | OFF-BI-10004600 | Office Supplies | Binders | Ibico Ibimaster 300 Manual Binding System | 735.980 | 2 | 0.0 | 331.1910 | 2020-12
3304 | 9970 | CA-2017-153871 | 2020-12-11 | Standard Class | RB-19435 | Richard Bierner | Consumer | United States | Plainfield | New Jersey | 7060 | East | OFF-AP-10003622 | Office Supplies | Appliances | Bravo II Megaboss 12-Amp Hard Body Upright, Replacement Belts, 2 Belts per Pack | 22.750 | 7 | 0.0 | 6.5975 | 2020-12
3305 | 9982 | CA-2017-163566 | 2020-08-03 | First Class | TB-21055 | Ted Butterfield | Consumer | United States | Fairfield | Ohio | 45014 | East | OFF-LA-10004484 | Office Supplies | Labels | Avery 476 | 16.520 | 5 | 0.2 | 5.3690 | 2020-08
3306 | 9988 | CA-2017-163629 | 2020-11-17 | Standard Class | RA-19885 | Ruben Ausman | Corporate | United States | Athens | Georgia | 30605 | South | TEC-AC-10001539 | Technology | Accessories | Logitech G430 Surround Sound Gaming Headset with Dolby 7.1 Technology | 79.990 | 1 | 0.0 | 28.7964 | 2020-11
3307 | 9989 | CA-2017-163629 | 2020-11-17 | Standard Class | RA-19885 | Ruben Ausman | Corporate | United States | Athens | Georgia | 30605 | South | TEC-PH-10004006 | Technology | Phones | Panasonic KX - TS880B Telephone | 206.100 | 5 | 0.0 | 55.6470 | 2020-11
3308 | 9991 | CA-2017-121258 | 2020-02-26 | Standard Class | DB-13060 | Dave Brooks | Consumer | United States | Costa Mesa | California | 92627 | West | FUR-FU-10000747 | Furniture | Furnishings | Tenex B1-RE Series Chair Mats for Low Pile Carpets | 91.960 | 2 | 0.0 | 15.6332 | 2020-02
3309 | 9992 | CA-2017-121258 | 2020-02-26 | Standard Class | DB-13060 | Dave Brooks | Consumer | United States | Costa Mesa | California | 92627 | West | TEC-PH-10003645 | Technology | Phones | Aastra 57i VoIP phone | 258.576 | 2 | 0.2 | 19.3932 | 2020-02
3310 | 9993 | CA-2017-121258 | 2020-02-26 | Standard Class | DB-13060 | Dave Brooks | Consumer | United States | Costa Mesa | California | 92627 | West | OFF-PA-10004041 | Office Supplies | Paper | It's Hot Message Books with Stickers, 2 3/4" x 5" | 29.600 | 4 | 0.0 | 13.3200 | 2020-02
3311 | 9994 | CA-2017-119914 | 2020-05-04 | Second Class | CC-12220 | Chris Cortes | Consumer | United States | Westminster | California | 92683 | West | OFF-AP-10002684 | Office Supplies | Appliances | Acco 7-Outlet Masterpiece Power Center, Wihtout Fax/Phone Line Protection | 243.160 | 2 | 0.0 | 72.9480 | 2020-05